On Fast Non-metric Similarity Search by Metric Access Methods

Author

  • Tomás Skopal
Abstract

The retrieval of objects from a multimedia database employs a measure that defines a similarity score for every pair of objects. The measure should effectively follow the nature of similarity; hence, it should not be limited by the triangular inequality, which is regarded as a restriction in similarity modeling. On the other hand, the retrieval should be as efficient (fast) as possible. The measure is thus often restricted to a metric, because the search can then be handled by metric access methods (MAMs). In this paper we propose a general method of non-metric search by MAMs. We show that the triangular inequality can be enforced for any semimetric (a reflexive, non-negative, and symmetric measure), resulting in a metric that preserves the original similarity orderings (and thus retrieval effectiveness). We propose the TriGen algorithm for turning any black-box semimetric into an (approximated) metric, using only the distance distribution in a fraction of the database. The algorithm finds the metric for which retrieval efficiency is maximized, considering any MAM.
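The idea of enforcing the triangular inequality while preserving similarity orderings can be illustrated with a minimal Python sketch. The abstract does not say how the inequality is enforced, so the sketch assumes one possible approach: applying a strictly increasing concave modifier (here a fractional power) to the semimetric and measuring, on a small sample, the fraction of triplets that still violate the inequality. All names (`squared_l2`, `triangle_error`, `power_modifier`, `find_weight`) and the candidate weights are hypothetical, and the coarse scan over weights is only a stand-in for illustration, not the TriGen algorithm itself.

```python
import itertools
import random


def squared_l2(a, b):
    """Squared Euclidean distance: a semimetric that breaks the triangular inequality."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def power_modifier(w):
    """Assumed modifier: d -> d**w, increasing and concave for 0 < w <= 1."""
    return lambda d: d ** w


def triangle_error(dist, sample, modifier=lambda d: d):
    """Fraction of sample triplets whose modified distances violate the triangular inequality."""
    bad = total = 0
    for a, b, c in itertools.combinations(sample, 3):
        x, y, z = sorted(modifier(dist(p, q)) for p, q in ((a, b), (b, c), (a, c)))
        total += 1
        if x + y < z - 1e-12:  # small slack for floating-point rounding
            bad += 1
    return bad / total


def find_weight(dist, sample, tolerance=0.0,
                candidates=(1.0, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1)):
    """Return the first (least concave) candidate weight whose error is within tolerance."""
    for w in candidates:
        if triangle_error(dist, sample, power_modifier(w)) <= tolerance:
            return w
    return None


random.seed(0)
sample = [(random.random(), random.random()) for _ in range(40)]

# Error of the raw semimetric versus the modified one (w = 0.5 gives plain Euclidean distance).
raw_error = triangle_error(squared_l2, sample)
fixed_error = triangle_error(squared_l2, sample, power_modifier(0.5))

# A strictly increasing modifier leaves the ranking of objects around a query unchanged.
query = sample[0]
raw_ranking = sorted(sample[1:], key=lambda p: squared_l2(query, p))
fixed_ranking = sorted(sample[1:], key=lambda p: squared_l2(query, p) ** 0.5)

best_w = find_weight(squared_l2, sample)
print(raw_error, fixed_error, raw_ranking == fixed_ranking, best_w)
```

In this sketch the scan starts from the least concave candidate, on the assumption that a modifier which distorts the distances less is preferable for search efficiency as long as the sampled triplets satisfy the inequality.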


Similar resources

A multi-step strategy for approximate similarity search in image databases

Many strategies for similarity search in image databases assume a metric and quadratic form-based similarity model where an optimal lower bounding distance function exists for filtering. These strategies are mainly two-step, with the initial "filter" step based on a spatial or metric access method followed by a "refine" step employing expensive computation. Recent research on robust matching me...

Non-Metric Space Library Manual

This document describes a library for similarity searching. Even though the library contains a variety of metric-space access methods, our main focus is on search methods for non-metric spaces. Because there are fewer exact solutions for non-metric spaces, many of our methods give only approximate answers. Thus, the methods are evaluated in terms of efficiency-effectiveness trade-offs rather th...

On Fuzzy vs. Metric Similarity Search in Complex Databases

The task of similarity search is widely used in various areas of computing, including multimedia databases, data mining, bioinformatics, social networks, etc. For a long time, the database-oriented applications of similarity search employed the definition of similarity restricted to metric distances. Due to the metric postulates (reflexivity, non-negativity, symmetry and triangle inequality), a...

Online Metric Learning and Fast Similarity Search

Metric learning algorithms can provide useful distance functions for a variety of domains, and recent work has shown good accuracy for problems where the learner can access all distance constraints at once. However, in many real applications, constraints are only available incrementally, thus necessitating methods that can perform online updates to the learned metric. Existing online algorithms...

On M-tree Variants in Metric and Non-metric Spaces

Although many metric access methods (MAMs) have been developed so far to solve the problem of similarity searching, there is still a great need for improving retrieval efficiency. One of the most acceptable MAMs is the M-tree, which meets the essential features important for large, persistent, and dynamic databases. The M-tree’s retrieval inefficiency is hidden in overlaps of its regions, therefore, its ...

Publication date: 2006